24 research outputs found

    A simultaneous spam and phishing attack detection framework for short message service based on text mining approach

    Short Messaging Service (SMS) is one of many communication media that scammers use to send persuasive messages to attract unwary recipients. In Malaysia, most sectors, such as telecommunication, banking, government, healthcare, and the private sector, have taken the initiative to educate their clients about SMS scams. Unfortunately, many people still fall victim. Within the field of SMS detection, only frameworks for detecting a single attack, Spam, have been studied. Phishing has never been studied. Existing detection frameworks are not suited to detecting SMS Phishing because these attacks have their own specific behaviour and characteristic words. This gives rise to the need for a framework that can detect both attacks at the same time. This thesis addresses the development of an SMS Spam and Phishing attack detection framework. The framework comprises three modules: Data Collection, Attack Profiling and Text Mining. For Module 1, the data sets used in this research are from the UCI Machine Learning Repository, the Dublin Institute of Technology (DIT), British English SMS and Malay SMS. The Phishing Rule-Based algorithm is used to extract SMS Phishing. For Module 2, the SMS Attack Profiling algorithm is used to produce SMS Spam and Phishing words. The Text Mining module consists of several phases, including Tokenization, Lemmatization, Feature Selection and Classifier. These phases are carried out with the Rapidminer and Weka data mining tools. Three (3) types of features are used in this framework: Generic Features, Payload Features and Hybrid Features. All of these features are examined, and the performance metrics used to compare the results are the True Positive (TP) rate and Accuracy (A). Four (4) sets of results were obtained from this research. 
The first result shows that extracting SMS Phishing from the SMS Spam class yields four (4) enhanced datasets: the UCI Machine Learning Repository, the Dublin Institute of Technology (DIT), British English SMS and Malay SMS. The second result is the SMS Spam and Phishing attack profiling from the enhanced UCI Machine Learning Repository, Dublin Institute of Technology (DIT), British English SMS and Malay SMS datasets. The third and fourth results are obtained from the Feature Selection and Classifier phases, where eighty (80) experiments were done to examine the Generic Features, Payload Features and Hybrid Features. Five (5) classification techniques are used: Naive Bayes, K-NN, Decision Tree, Random Tree and Decision Stump. With Rapidminer, the Hybrid Feature accuracy is 77.47% for Naive Bayes, 78.56% for K-NN, 57.16% for Decision Tree, 57.24% for Random Tree and 57.16% for Decision Stump. With Weka, the accuracy is 71.45% for Naive Bayes, 81.64% for K-NN, 57.10% for Decision Tree, 70.64% for Random Tree and 60.19% for Decision Stump. The experiments were done using both the Rapidminer and Weka data mining tools because this is the first study to detect SMS Spam and Phishing attacks at the same time, and the results are acceptable. Additionally, the proposed framework can detect both attacks simultaneously using text mining approaches
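
As a rough illustration of the tokenization and classifier phases described above, the following is a minimal sketch of a three-class multinomial Naive Bayes text classifier with add-one smoothing. It is not the thesis's Rapidminer or Weka setup: the function names, the `TOY_MESSAGES` data and the class labels are hypothetical, and lemmatization and feature selection are omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy messages; the thesis uses the UCI, DIT, British English
# and Malay SMS datasets, none of which are reproduced here.
TOY_MESSAGES = [
    ("ham", "are we still meeting for lunch today"),
    ("ham", "call me when you get home"),
    ("spam", "win a free prize claim your free ringtone now"),
    ("spam", "free entry to weekly prize draw text win"),
    ("phishing", "your bank account is suspended verify your password at this link"),
    ("phishing", "urgent confirm your account pin to avoid suspension"),
]


def tokenize(text):
    """Lowercase a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(messages):
    """Count messages and tokens per class from (label, text) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for label, text in messages:
        class_counts[label] += 1
        for tok in tokenize(text):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, token_counts, vocab


def classify(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TOY_MESSAGES)
print(classify(model, "verify your bank account password"))
```

Treating phishing as its own class, rather than as a subset of spam, is what lets a single classifier flag both attacks at once.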

    Customer profiling using classification approach for bank telemarketing

    Telemarketing is a type of direct marketing in which a salesperson contacts customers by phone to sell products or services. The database of prospective customers comes from a direct marketing database. It is important for the company to predict the set of customers with the highest probability of accepting the sale or offer based on their personal characteristics or shopping behaviour. Recently, companies have started to resort to data mining approaches for customer profiling. This project focuses on helping banks to increase the accuracy of their customer profiling through classification, as well as identifying a group of customers who have a high probability of subscribing to a long-term deposit. In the experiments, three classification algorithms are used: Naïve Bayes, Random Forest, and Decision Tree. The experiments measured accuracy, precision and recall, and showed that classification is useful for predicting customer profiles and increasing telemarketing sales

    A Comparison of Sensitive Information Detection Framework using LSTM and RNN Techniques

    Sensitive information is meant to be stored securely to avoid data breaches. This research therefore examines whether Long Short Term Memory (LSTM) and Recurrent Neural Network (RNN) techniques are suitable for a sensitive information detection framework, since performance analyses of these methods are not common. The objectives are to design a Sensitive Information Detection Framework using Deep Learning techniques, to detect sensitive information using LSTM and RNN techniques, and to test and validate the performance of the framework. This research uses a raw data set provided by the Book Hack Enterprise company and the Weka Explorer software tool to go through the 5 phases of the proposed framework: Dataset, Preprocessing, Feature Extraction, Classification Algorithm and Performance Evaluation. The model performance is evaluated based on Accuracy, Precision, Recall and F1-score
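
The four evaluation measures named above can be computed from a binary confusion matrix. The sketch below assumes a hypothetical `evaluate` helper and made-up `sensitive`/`public` labels; it does not reproduce any result from the study.

```python
def evaluate(y_true, y_pred, positive):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 true positives, 1 false negative,
# 1 false positive, 3 true negatives.
y_true = ["sensitive"] * 4 + ["public"] * 4
y_pred = ["sensitive", "sensitive", "sensitive", "public",
          "sensitive", "public", "public", "public"]
scores = evaluate(y_true, y_pred, positive="sensitive")
print(scores)
```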

    Machine Learning-Based Distributed Denial of Service Attack Detection on Intrusion Detection System Regarding to Feature Selection

    Distributed Denial of Service (DDoS) is a type of network attack that increases in volume and intensity each year. DDoS attacks are also among the major types of cyber security threats. Early detection plays a key role in avoiding the catastrophic effects of DDoS attacks on server infrastructure. Detection techniques in the traditional Intrusion Detection System (IDS) are far from perfect compared to the modern techniques and tools used by attackers, because the traditional IDS only uses signature-based or anomaly-based detection models and raises many false positives, since the flow of computer network data packets has complex properties in terms of both size and source. Given this deficiency in the ordinary IDS, this study aims to detect DDoS attacks by using machine learning techniques to enhance IDS policy development. According to the experiment, the selection of features plays an important role in the precision of the detection results and in the performance of machine learning on classification problems. The combination of seven key selected dataset features used as input to a neural network classifier in this study provides the highest accuracy, at 97.76%

    You tube spam comment detection using support vector machine and k–nearest neighbor

    Social networking sites such as YouTube, Facebook and others are very popular nowadays. The best thing about YouTube is that users can subscribe and give their opinions in the comment section. However, this attracts spammers, who spam the comments on videos. Thus, this study develops a YouTube spam comment detection framework using Support Vector Machine (SVM) and K-Nearest Neighbor (k-NN). There are five (5) phases involved in this research: Data Collection, Pre-processing, Feature Selection, Classification and Detection. The experiments are done using Weka and RapidMiner. SVM and k-NN show good accuracy with both machine learning tools. Another way to avoid spam attacks is not to click links in comments

    Feature Selection to Enhance Phishing Website Detection Based On URL Using Machine Learning Techniques

    The detection of phishing websites based on machine learning has gained much attention due to its ability to detect newly generated phishing URLs. To detect phishing websites, most techniques combine URLs, web page content, and external features. However, the content of the web page and external features are time-consuming, require large computing power, and are not suitable for resource-constrained devices. To overcome this problem, this study applies feature selection techniques based on the URL to improve the detection process. The methodology for this study consists of seven stages, including data preparation, preprocessing, splitting the dataset into training and validation, feature selection, 10-fold cross-validation, validating the model, and finally performance evaluation. Two public datasets were used to validate the method. TreeSHAP and Information Gain were used to rank features and select the top 10, 15, and 20. These features are fed into three machine learning classifiers which are Naïve Bayes, Random Forest, and XGBoost. Their performance is evaluated based on accuracy, precision, and recall. As a result, the features ranked by TreeSHAP contributed most to improving detection accuracy. The highest accuracy of 98.59 percent was achieved by XGBoost for the first dataset with 15 features. For the second dataset, the highest accuracy is 90.21 percent using 20 features and Random Forest. As for Naïve Bayes, the highest accuracy recorded is 98.49 percent using the first dataset
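
The ranking-and-top-k step can be sketched for Information Gain alone; TreeSHAP is not covered here. The sketch assumes discrete URL features, and the feature names (`has_ip_host`, `has_at_symbol`, `uses_https`) and the eight toy rows are hypothetical rather than taken from the study's datasets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(rows, labels, k):
    """Rank feature names by information gain and keep the best k."""
    scores = {name: information_gain([row[name] for row in rows], labels)
              for name in rows[0]}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical binary URL features for four phishing and four legitimate URLs.
labels = ["phishing"] * 4 + ["legitimate"] * 4
ip_host = [1, 1, 1, 1, 0, 0, 0, 0]
at_sign = [1, 1, 1, 0, 0, 0, 0, 1]
https = [1, 0, 1, 0, 1, 0, 1, 0]
rows = [{"has_ip_host": a, "has_at_symbol": b, "uses_https": c}
        for a, b, c in zip(ip_host, at_sign, https)]

print(top_k_features(rows, labels, k=2))
```

In this toy data `uses_https` is evenly split within each class, so it carries no information about the label and falls out of the top two.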

    Feature Selection of Distributed Denial of Service (DDos) IoT Bot Attack Detection Using Machine Learning Techniques

    Distributed Denial of Service (DDoS) attacks can be launched through numerous media and have become one of the biggest threats to computer security. One of the most effective approaches is to develop an algorithm using Machine Learning (ML). However, DDoS detection suffers from low accuracy, owing to the feature selection and classifier used, and from time-consuming detection. This research focuses on feature selection for DDoS IoT bot attack detection using ML techniques. Two NetFlow datasets, NF_ToN_IoT and NF_BoT_IoT, are processed with two attribute selection methods, Information Gain and Gain Ratio, and ranked using the Ranker algorithm. These datasets are then tested using four different algorithms: Naïve Bayes (NB), K-Nearest Neighbor (KNN), Decision Table (DT) and Random Forest (RF). The results are then compared using confusion matrix evaluation: Accuracy, True Positive, True Negative, Precision and Recall. Results for the two datasets are reported for the Top 4, Top 8 and Top 12 selected features. The best overall classifier is Naïve Bayes, with accuracies of 97.506% and 90.67% on the NF_ToN_IoT and NF_BoT_IoT datasets
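
Gain Ratio divides Information Gain by the entropy of the attribute itself, which penalizes attributes with many distinct values. The sketch below illustrates that difference on made-up data; the attribute names `protocol` and `flow_id` and the `rank_by_gain_ratio` helper are hypothetical and are not taken from the NetFlow datasets or from Weka.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def information_gain(values, labels):
    """Label entropy minus the weighted entropy within each attribute value."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the attribute's own entropy."""
    split_info = entropy(values)
    if split_info == 0:  # a constant attribute cannot split the data
        return 0.0
    return information_gain(values, labels) / split_info


def rank_by_gain_ratio(features, labels):
    """Return attribute names ordered from highest to lowest gain ratio."""
    return sorted(features,
                  key=lambda name: gain_ratio(features[name], labels),
                  reverse=True)


# Hypothetical flows: both attributes separate the classes perfectly,
# but flow_id does so only because every value is unique.
labels = ["attack"] * 4 + ["benign"] * 4
features = {
    "flow_id": [0, 1, 2, 3, 4, 5, 6, 7],
    "protocol": [17, 17, 17, 17, 6, 6, 6, 6],
}
print(rank_by_gain_ratio(features, labels))
```

Both attributes have an Information Gain of 1 bit here, yet the identifier-like `flow_id` ranks below `protocol` once its 3 bits of split entropy are divided out.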

    Classification of metamorphic virus using n-grams signatures

    A metamorphic virus can change, translate, and rewrite its own code once it has infected a system in order to bypass detection. The computer system can then be seriously damaged by the undetected metamorphic virus. It is therefore vital to design a metamorphic virus classification model that can detect this virus. This paper focuses on the detection of metamorphic viruses using the Term Frequency Inverse Document Frequency (TF-IDF) technique. The research was conducted using the Second Generation virus dataset. In the first step, the classification model clusters the metamorphic viruses using the TF-IDF technique. Then, the virus clusters are evaluated using the Naïve Bayes algorithm in terms of accuracy. The types of virus classes and features are extracted from bi-grams of assembly language. The results show that the proposed model was able to classify metamorphic viruses using TF-IDF with an optimal number of virus classes, with an average accuracy of 94.2%
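
To make the bi-gram and TF-IDF feature extraction concrete, here is a minimal sketch that weights opcode bi-grams with the plain `tf * log(N / df)` variant. The exact TF-IDF formula used in the paper is not stated, so this variant is an assumption, and the opcode sequences and function names are hypothetical.

```python
import math
from collections import Counter


def bigrams(opcodes):
    """Join each pair of adjacent opcodes into one bi-gram token."""
    return [f"{a} {b}" for a, b in zip(opcodes, opcodes[1:])]


def tfidf(documents):
    """Return one {bigram: weight} dict per opcode sequence.

    Assumed variant: tf is the bi-gram's relative frequency within the
    sequence and idf is log(N / document frequency).
    """
    counts_per_doc = [Counter(bigrams(doc)) for doc in documents]
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())

    n_docs = len(documents)
    weights = []
    for counts in counts_per_doc:
        total = sum(counts.values())
        weights.append({gram: (count / total) * math.log(n_docs / doc_freq[gram])
                        for gram, count in counts.items()})
    return weights


# Hypothetical disassembled samples, each reduced to its opcode sequence.
samples = [
    ["mov", "push", "call", "pop"],
    ["mov", "push", "xor", "pop"],
    ["nop", "mov", "push", "ret"],
]
weights = tfidf(samples)
print(weights[0])
```

The bi-gram `mov push` occurs in every sample, so its weight is zero, while `push call` is unique to the first sample and receives the highest weight there.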

    Network monitoring system to detect unauthorized connection

    The Network Monitoring System to Detect Unauthorized Connection is a network analytics tool used to review local area network usage. The main purpose of the application is to monitor Internet Protocol traffic between the local area network and the Internet. In addition, the system aims to detect unauthorized Internet Protocol addresses inside the network range. It can also keep network intruders off the Local Area Network (LAN) connection. It is a computerized system that incorporates the elements of confidentiality, integrity and availability. The system was built using the waterfall methodology, which begins with system analysis and proceeds through design, implementation, testing, installation and maintenance. The system uses Visual Studio 2013 with SQL Server for server operations. There are ten modules in this system: user main page, register admin module, register staff module, login admin module, login staff module, admin menu module, staff menu module, scan view module, status view module and report module. About 30 respondents agreed and were satisfied with the system. As a result, this system was successfully built to detect and block unauthorized access to the network
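
The core check, flagging addresses that fall inside the LAN range but are not registered, can be sketched with Python's standard `ipaddress` module. This is an illustration only: the described system was built with Visual Studio 2013 and SQL Server, and the `find_unauthorized` helper, the address range and the allowlist below are hypothetical.

```python
import ipaddress


def find_unauthorized(observed, lan_cidr, authorized):
    """Return observed addresses inside the LAN range that are not allowlisted."""
    network = ipaddress.ip_network(lan_cidr)
    allowed = {ipaddress.ip_address(addr) for addr in authorized}
    flagged = []
    for raw in observed:
        addr = ipaddress.ip_address(raw)
        # Addresses outside the LAN range are not this check's concern.
        if addr in network and addr not in allowed:
            flagged.append(raw)
    return flagged


# Hypothetical scan result, LAN range and registered hosts.
seen = ["192.168.1.10", "192.168.1.77", "8.8.8.8"]
print(find_unauthorized(seen, "192.168.1.0/24", ["192.168.1.10"]))
```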

    IMPROVING SECURITY AND PRIVACY REQUIREMENT FOR BUSINESS REGISTRATION SYSTEM (BRS)

    This paper explains the security and privacy requirements of the Business Registration System (BRS) hosted at a government institution. These requirements will be used to improve the security and privacy of the BRS